Data Lake Development with Big Data by Pradeep Pasupuleti & Beulah Salome Purra


Author: Pradeep Pasupuleti & Beulah Salome Purra
Language: eng
Format: azw3
Publisher: Packt Publishing
Published: 2015-11-26T05:00:00+00:00


Addressing the limitations using Data Lake

The Data Lake addresses these constraints by enabling a write-once-run-anywhere development paradigm. Under this paradigm, you design, code, and test your Integration data flow only once; the underlying hardware configuration details are abstracted away from the development process. Once the Integration data flow has been deployed, it can be seamlessly ported onto a grid compute environment with any number of nodes. The data flow does not have to be recompiled or reconfigured when the compute environment is scaled up or down to match the demands that the data places on it.

This approach reduces the overall execution time of the Data Integration process, since all available hardware resources are utilized to crunch the data. It also draws a clear boundary between the hardware configuration and the code itself, so that changes in hardware never force a recompile or reconfiguration. This fundamental freedom to literally write once and run anywhere is what gives the Data Lake its ability to scale on demand seamlessly.
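The write-once-run-anywhere idea above can be sketched with a minimal, hypothetical example (not the book's tooling): the integration logic is defined once, and only the worker count in the compute configuration changes. The `transform` step and record layout here are illustrative assumptions.

```python
from multiprocessing import Pool


def transform(record):
    # A hypothetical integration step: derive a cleaned value from a raw record.
    return {"id": record["id"], "value": record["value"] * 2}


def run_flow(records, workers):
    # The data flow itself never changes; only the degree of parallelism does.
    with Pool(processes=workers) as pool:
        return pool.map(transform, records)


if __name__ == "__main__":
    data = [{"id": i, "value": i} for i in range(10)]
    # Same code, scaled from 2 to 8 workers: no recompile, no reconfiguration,
    # and the result is identical regardless of the compute environment.
    assert run_flow(data, workers=2) == run_flow(data, workers=8)
```

In a real Data Lake stack the same principle appears at a larger scale: a distributed engine accepts the identical flow definition whether it runs on one node or on a grid of many, with parallelism set by deployment configuration rather than by the code.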

